Reinterpretation of Experimental Results with Basis Templates 
Experimental analysis of data from particle collisions is typically expressed as statistical limits on 
a few benchmark models of particular, often historical, interest. The implications of the data for 
other theoretical models (current or future) may be powerful, but they cannot typically be calculated 
from the published information, except in the simplest case of a single-bin counting experiment. We 
present a novel solution to this long-standing problem by expressing the new model as a linear 
combination of models from published experimental analyses, allowing for the trivial calculation of 
limits on a nearly arbitrary model. We present tests in simple toy experiments, demonstrate 
self-consistency by using published results to reproduce other published results on the same spectrum, 
and provide a reinterpretation of a search for chiral down-type heavy quarks ($b'$) in terms of a search 
for an exotic heavy quark ($T$) with similar but distinct phenomenology. We find $m_T > 419$ GeV at 
95% CL, currently the strongest limit if the $T$ quark decays via $T \to Wb$, $T \to tZ$ and $T \to tH$. 

PACS numbers: 12.60.-i, 14.65.Jk 



INTRODUCTION 

Data from particle colliders may reveal new states of 
matter or evidence for new forms of interactions, or dis- 
prove theories of such new phenomena. When no evi- 
dence of new phenomena is seen, the experimental col- 
laborations who collect and analyze the data communi- 
cate the non-observation in terms of statistical limits on 
theories which predict the new states or interactions. 

The space of possible theoretical models is impossibly 
vast; therefore experimental results are typically commu- 
nicated as limits on a few benchmark models of particular 
or historical interest. However, the data can provide tight 
constraints on many other models not included in the ex- 
perimental analysis - such as models not yet constructed. 
One solution would be for the experiments to provide a 
rapid mechanism for testing new models against previ- 
ously analyzed datasets. Currently, however, the exper- 
iments are not capable of (or perhaps not interested in) 
responding to an exhaustive list of models; furthermore, 
experimental re-analysis typically occurs on a timescale 
of weeks or months rather than hours or days. Some 
ideas have been proposed to streamline this process, such 
as RECAST [1], which elegantly connects the individual 
experimental experts to the theoretical models, but does 
not remove the lengthy experimental review process and 
so does not provide a rapid and certain solution - the 
experiment may still choose to not provide limits on the 
requested models. 

To date, it has been impossible for those outside the 
experimental collaborations to reinterpret the published 
results on benchmark models in order to set limits on 
new models, except in a single restrictive case. When ex- 
perimental analysis is performed as a simple selection of 
events (e.g. [2, 3]), it can be imagined as a counting experiment 
and the information needed to derive limits on 



new models is often included in the experimental publication. 
To calculate a statistical upper limit on the number 
of events from a new source, $N_{\rm events}$, which is compatible 
with the data, all that is needed are the expected contributions 
from standard model (SM) backgrounds (with 
uncertainties) and the observed collider event yield. One 
can then calculate an upper limit on the production cross 
section of the new source, $\sigma_{\rm new}$, using 

$$\sigma_{\rm new} = \frac{N_{\rm events}}{\epsilon_{\rm selection} \times \mathcal{L}},$$

where $\epsilon_{\rm selection}$ is the efficiency of the experimental detection 
and selection and $\mathcal{L}$ is the integrated luminosity 
of the dataset. Reasonable methods exist for estimating 
$\epsilon_{\rm selection}$ [4]. 
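The single-bin calculation described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are not taken from any experimental analysis: the background is treated as exactly known (no systematic uncertainties), the limit is a classical Poisson upper limit rather than the CLs construction, and the function names and the example efficiency and luminosity are hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def events_upper_limit(n_obs, background, cl=0.95):
    """Classical upper limit on the number of signal events.

    Assumption: the background is known exactly. The limit is the signal
    yield s for which P(N <= n_obs | s + background) = 1 - cl, found by
    bisection (the probability falls monotonically as s grows).
    """
    lo, hi = 0.0, 10.0 + 10.0 * (n_obs + 1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cross_section_limit(n_limit, efficiency, luminosity):
    """sigma_new = N_events / (efficiency * luminosity)."""
    return n_limit / (efficiency * luminosity)


# Hypothetical inputs: zero observed events, no background,
# 10% selection efficiency, 1000 pb^-1 of integrated luminosity.
n_limit = events_upper_limit(n_obs=0, background=0.0)
print(n_limit, cross_section_limit(n_limit, 0.10, 1000.0))
```

With zero observed events and no background, the event limit reduces to $-\ln(0.05) \approx 3.0$, which is a convenient sanity check of the bisection.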

Access to only single-bin analyses for reinterpretation 
is an unfortunately restrictive condition, as many of the 
most powerful and important results are those which perform 
a multi-bin fit to a spectrum, taking advantage of 
distinct signal and background shapes (e.g. [5-7]). Use 
of a multi-bin histogram can greatly improve the sensitivity, 
but makes reinterpretation difficult, as one needs 
access to the bin-to-bin correlations for all background 
components and each of the systematic correlations [8]. 
Multi-bin data may be reinterpreted directly in terms of 
new theoretical models, but such reinterpretations rely on simple 
detector simulation tools which may not accurately describe 
the reconstructed variable. It is also difficult to describe 
all the systematic uncertainties, especially in a multi-bin 
analysis where a systematic uncertainty may also distort 
the shape of the template. Such an approach is suited for 
comparing features in the data against specific new models [9] 
but cannot be reliably used to calculate realistic 
limits in the manner of the experimental collaborations. 
The result is that a large fraction of published results are 
unusable for reinterpretation. 
FIG. 1: Upper limits on the production cross section of a 
new signal hypothesis can be derived if the new signal can be 
constructed as a linear combination of signals with existing 
experimental limits. 



A solution to this problem would be experimental pub- 
lication of all the details of the limit calculation, for ex- 
ample the complete likelihood used to analyze the final 
binned data. This is not currently provided by the experiments, 
though there are prospects for future mechanisms [10, 11]. 



We present a novel solution to this problem, which al- 
lows for the reinterpretation of published limits derived 
from multi-bin analysis. If the binned distribution of a 
new theoretical model can be expressed as a linear com- 
bination of the models for which published limits exist, a 
simple relationship allows the calculation of the limit on 
the new model as a simple combination of the limits on 
the published models. This allows a large swath of previously 
unusable experimental results to confront a nearly 
arbitrary space of possible models without involvement 
by the experiment. 

In the following, we introduce "the basis-limit hypothesis", 
demonstrate its effectiveness in simple artificial scenarios, 
show self-consistency by using published results 
to reproduce other published results on the same spectrum, 
and provide a reinterpretation of a search for chiral 
down-type heavy quarks ($b'$) in terms of a search for an 
exotic heavy quark ($T$) with similar but distinct phenomenology. 



THE BASIS-LIMIT HYPOTHESIS 

For a specific experimental dataset and background 
model which is analyzed using a binned likelihood in 
a variable $x$, there is a set of $n$ signal hypotheses 
described by binned distributions ("templates") 
$f_1(x), f_2(x), \ldots, f_n(x)$. In the case that the data prefer 
the background model, each signal template has an associated 
cross-section upper limit, which can be expressed 
relative to the theoretical prediction for that signal hypothesis: 
$\sigma_i^{\rm limit}/\sigma_i^{\rm theory}$, where $\sigma_i^{\rm theory}$ is the cross section 
and $\epsilon_i$ is the selection efficiency for the $i$-th signal hypothesis. 

Given a new, untested signal hypothesis template, 
$F(x)$, which may be expressed as a linear combination 
of the $f_i(x)$, as 

$$F(x) = \sum_{i=1}^{n} a_i f_i(x) \quad (a_i > 0),$$

our hypothesis is that an upper limit on the cross section 
of the new signal hypothesis, $\sigma_F^{\rm limit}$, can be calculated from 
the limits $(\sigma_i^{\rm limit}/\sigma_i^{\rm theory})$ associated with each $f_i(x)$, as 

$$\frac{\sigma_F^{\rm limit}}{\sigma_F^{\rm theory}} = \left[ \sum_{i=1}^{n} \frac{a_i}{\sigma_i^{\rm limit}/\sigma_i^{\rm theory}} \right]^{-1}. \qquad (1)$$

Stated briefly, given limits on a set of basis templates, 
we claim to be able to calculate limits on any new signal 
template which can be expressed as a linear combination 
of the basis templates. The idea is expressed diagrammatically 
in Fig. 1. 



PERFORMANCE IN TOY EXAMPLES 

We demonstrate the application of the basis-limit hypothesis 
using a simple toy scenario: a 3-bin analysis with 
two signal hypotheses, shown in Figure 2. The two signal 
templates are the basis templates and are generated with 
an arbitrary but equal signal cross section, $\sigma^{\rm theory}$. 

We use the CLs technique [12, 13] to estimate cross-section 
upper limits on each of the basis templates relative 
to a flat background. For a new signal hypothesis, 
$F$, expressed as a linear combination of the templates, we 
predict the limit on $F$ using the limits on the basis templates, 
as shown in Equation 1. Table I gives an example 
for a single case. 

A comprehensive test, scanning many possible values 
of $a_i$, shows that the basis-limit predictions are robust in 
this toy example; see Fig. 3. 
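The basis-limit relation can be written as a short function. The sketch below assumes the relation takes the harmonic form $\sigma_F^{\rm limit}/\sigma_F^{\rm theory} = [\sum_i a_i / (\sigma_i^{\rm limit}/\sigma_i^{\rm theory})]^{-1}$, which is the form that reproduces the toy numbers quoted in Table I ($a_1 = 0.5$, $a_2 = 2.0$, basis limits of 0.27, predicted limit 0.11). The function name `basis_limit` is hypothetical.

```python
def basis_limit(coefficients, basis_limits):
    """Predicted relative limit on F = sum_i a_i * f_i.

    coefficients : non-negative a_i of the linear combination
    basis_limits : published relative limits sigma_i^limit / sigma_i^theory
    Assumed form: 1 / sum_i (a_i / limit_i).
    """
    if len(coefficients) != len(basis_limits):
        raise ValueError("need one limit per coefficient")
    if any(a < 0.0 for a in coefficients):
        raise ValueError("coefficients must be non-negative")
    inverse = sum(a / mu for a, mu in zip(coefficients, basis_limits))
    if inverse <= 0.0:
        raise ValueError("at least one coefficient must be positive")
    return 1.0 / inverse


# Toy values from Table I: F = 0.5 f_1 + 2.0 f_2, both basis limits 0.27.
predicted = basis_limit([0.5, 2.0], [0.27, 0.27])
print(round(predicted, 2))  # 0.11
```

A template that is simply a rescaled copy of one basis template, $F = a f_1$, has its relative limit divided by $a$, as expected when the predicted yield is multiplied by $a$.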



POSSIBLE SHORTCOMINGS 
Limited basis templates 

The basis-limit hypothesis is not universally robust: it 
cannot be blindly applied to every published result or 
completely arbitrary new models. The primary limita- 
tion is whether enough example shapes are presented to 
form a basis to describe a new test signal. 
FIG. 2: The example toy basis templates, $f_1$ (top), $f_2$ (center) 
and a linear combination $F = 0.5 f_1 + 2 f_2$ (bottom). We derive a limit 
on the $F$ template using limits on $f_1$, $f_2$ and the coefficients 
$a_1 = 0.5$, $a_2 = 2.0$; see Table I. 
FIG. 3: Tests of the basis-limit hypothesis in a toy scenario 
involving two signal templates; see Figure 2. For a new shape 
$F = a_1 f_1 + a_2 f_2$ and a variety of $a_1, a_2$ mixtures, we compare 
the relative upper limit on the model described by the 
$F$ template as calculated explicitly to that predicted by the 
basis-limit formula, which uses only the limits on $f_1$, $f_2$ and 
the coefficients $a_1, a_2$. Top pane shows the explicit and predicted 
upper limits for varying $a_1, a_2$; bottom pane shows the 
difference between the explicit and predicted upper limits. Statistical 
uncertainties come from the pseudo-experiments required 
in the limit calculation. 



TABLE I: Demonstration of basis-limit application in a toy 
scenario. A new signal hypothesis, $F$, is formed from a linear 
combination of $f_1$ and $f_2$ using the coefficients $a_i$ (see Fig. 2). 
The limits ($\sigma_i^{\rm limit}/\sigma_i^{\rm theory}$) on the $f_i$ and the coefficients $a_i$ can 
be used to predict the limit on the new hypothesis, $F$, using 
Equation 1. The predicted limit is confirmed by an explicit 
calculation of the limit using the $F$ template. 

                                                    $f_1$    $f_2$    $F$
$a_1$                                               1.0               0.5
$a_2$                                                        1.0      2.0
$\sigma^{\rm limit}/\sigma^{\rm theory}$ (meas)     0.27     0.27     0.11
$\sigma^{\rm limit}/\sigma^{\rm theory}$ (pred)                       0.11



As an extreme example to illustrate the point, consider 
a two-bin analysis with large background rates in 
each bin, $f_{\rm bg}(x)$ = (10'^, 10'^), but large uncertainty, 
$\Delta f_{\rm bg}(x)$ = (10^, 10^). A two-bin signal template with 
signal isolated in just one bin, $f(x) = (100, 0)$, may have 
reasonable sensitivity, as the background uncertainty can 
be reduced by a fit to the observed data in the 
signal-depleted bin. But, if we were to make a poor choice of 
basis templates, in which each bin kept significant signal 
contributions, $f_1(x) = (101, 99)$, $f_2(x) = (99, 101)$, no 
reduction in the background uncertainty would be possible 
for either signal template, leading to weak limits for 
both. There is no positive set of coefficients $a_i$ which can 
be combined to describe a signal hypothesis with signal 
isolated in one bin; these templates fail to capture the 
real power of the data set. 
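The statement that no positive coefficients exist for this pair of templates can be checked directly by solving the two-bin system $a_1 f_1 + a_2 f_2 = (100, 0)$. The helper name `solve_2x2` is hypothetical; it applies Cramer's rule to the bin contents quoted in the text.

```python
def solve_2x2(f1, f2, target):
    """Solve a1 * f1 + a2 * f2 = target for two-bin templates (Cramer's rule)."""
    det = f1[0] * f2[1] - f2[0] * f1[1]
    if det == 0:
        raise ValueError("templates are proportional; no unique solution")
    a1 = (target[0] * f2[1] - f2[0] * target[1]) / det
    a2 = (f1[0] * target[1] - target[0] * f1[1]) / det
    return a1, a2


# Basis templates and isolated-signal target from the example in the text.
a1, a2 = solve_2x2((101, 99), (99, 101), (100, 0))
print(a1, a2)  # 25.25 -24.75
```

The unique solution has $a_2 < 0$, so the isolated-signal template lies outside the set of shapes that these two basis templates can describe with non-negative coefficients.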

The basis-limit hypothesis implicitly assumes that 
the constraints on the background systematics obtained 
when determining limits for each of the basis templates 
are comparable to the constraints that would be obtained 
when determining limits using the new signal. This 
assumption is valid if, as is often the case, the background 
systematics are predominantly constrained in a 
background-rich region where both the basis templates 
and the new signal are always small. 

Another extreme example where the basis-limit hypothesis 
would fail is the case of a new signal template 
which is identical to a published template, but with larger 
or different systematic uncertainties. 



Approximate Efficiency Calculations 

Any reinterpretation of published data in terms of a 
new theoretical model requires an estimate of the efficiency, 
$\epsilon_{\rm selection}$, of the experiment to detect and select 
events from the proposed new source. While the most accurate 
estimate of $\epsilon_{\rm selection}$ can only be performed by the 
experimental collaboration, often via use of their private 
official GEANT-based [14] detector simulation programs, 
there are well-established public tools, such as PGS [4], 
which provide estimates via a parametric simulation with 
reasonable accuracy (5-20% relative) in most regimes. 

One application of the basis-limit approach is to build 
templates of the new theory using the available public 
simulation programs and express them as linear combi- 
nations of the published templates produced by the ex- 
periments with their private simulation programs. This 
incurs the same acceptable level of uncertainty in each 
bin as in the well-established single-bin case. The approximation 
of $\epsilon$ results in an approximate determination 
of the $a_i$ coefficients. 

To calculate the limits on a new theory, the $\epsilon$ values 
are not directly needed; all that is required are 
the coefficients $a_i$ and the limits on each template $f_i$. 
If the new theory templates and the basis templates 
are built using the same approximate public simulation, 
many of the approximations may cancel. For example, 
if $\epsilon_F^{\rm approx}/\epsilon_F^{\rm official} = \epsilon_i^{\rm approx}/\epsilon_i^{\rm official}$, then it may be possible 
to calculate the $a_i$ without incurring approximations 
due to $\epsilon$. In addition, it is possible with this approach 
to make templates in cases when the published analysis 
has limits quoted for many cases but only a few example 
templates (e.g. [15]). 
FIG. 4: Distribution in event $H_T$ (the scalar sum of transverse 
momenta of jets and leptons) from same-sign top-quark 
pair production (blue, dashed) expressed as a linear combination 
of distributions from gluino-pair production and squark-pair 
production. Panes show the $t_L t_L$ (top), $t_L t_R$ (center) 
and $t_R t_R$ (bottom) chirality configurations. 



TESTS WITH PUBLISHED LIMITS 

The toy scenario above demonstrates the validity of 
the basis-limit hypothesis when the new signal is exactly 
a linear combination of previously examined signals. In 
this section, we demonstrate the use of basis limits in 
realistic scenarios using published experimental results. 

Same-sign dileptons at CDF 

A critical test of the basis-limit hypothesis is a com- 
parison of limits predicted using Equation [1] with limits 
derived by the experiment itself. This requires a pair of 
experimental limits which use identical datasets, selec- 
tions and background models. One such pair is a search 
at CDF in same-sign dileptons with jets and missing energy 
in 6.1 fb$^{-1}$; the dataset was used to extract limits 
on supersymmetry [16] and same-sign top-quark pair 
production [17]. Both analyses use $H_T$, the scalar sum of 
transverse momenta of jets and leptons, as the discriminating 
variable. 

We form linear combinations of the SUSY templates 
to reproduce the same-sign top-quark templates; see Figure 4. 
The coefficients $a_i$ and the published limits on each 
of the SUSY templates can then be used to predict the 
limits on same-sign top-quark production; see Table II. 
The predicted limits agree with the published results in 
each case. 



Self-consistency test 

Pairs of published experimental results with identi- 
cal selection, dataset and backgrounds but distinct sig- 
nal hypotheses are quite rare. However, we can probe 



TABLE II: Prediction of limits on same-sign top quark pairs 
in various chirality configurations derived from limits on 
supersymmetric production of gluinos or squarks. Coefficients 
of the supersymmetric particle templates are denoted by the 
particles and mass hierarchy, all in units of GeV. Note that 
coefficients are individually quoted only for the three most 
significant basis templates; the sum of the 30 remaining coefficients, 
$\sum a_{\rm other}$, is also shown. 

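                                                                               $t_L t_L$   $t_L t_R$   $t_R t_R$
$a_{\tilde{g}(m=200),\,\tilde{\chi}^{\pm}(m=75),\,\tilde{\chi}^{0}(m=50)}$     0.061       0.074       0.098
$a_{\tilde{g}(m=300),\,\tilde{\chi}^{\pm}(m=150),\,\tilde{\chi}^{0}(m=100)}$   0.015
$a_{\tilde{q}(m=200),\,\tilde{\chi}^{\pm}(m=75),\,\tilde{\chi}^{0}(m=50)}$     0.040       0.035       0.011
$\sum a_{\rm other}$                                                           0.100       0.068       0.700

Experiment Results [17] [fb]                                                   54          51          44
Our Prediction [fb]                                                            53.7        50.9        44.1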
FIG. 5: Pair-production of heavy down-type $b'$ quarks with 
decay $b' \to Wt$. 



the basis-limit performance in realistic scenarios using 
a self-consistency test. 

In a set of $N$ signal templates, we can attempt to describe 
the $i$-th template using the other $N-1$ templates. 
Given the published limits on the $N-1$ templates, we 
can predict the limit on the $i$-th template and compare 
it to the published limit. 
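The leave-one-out procedure can be sketched as follows. This is only an illustration: the paper does not state how the coefficients were determined, so the non-negative least-squares fit by projected gradient descent is an assumption, the basis-limit relation is assumed to have the harmonic form that reproduces Table I, and the three-bin templates and limits are hypothetical numbers chosen so that the middle template is exactly the average of the other two.

```python
def fit_nonnegative(basis, target, n_iter=5000):
    """Least-squares coefficients a_i >= 0 with sum_i a_i * basis[i] ~ target.

    Projected gradient descent; the step size uses a simple upper bound
    on the curvature, so it is safe but not fast.
    """
    n_bins = len(target)
    curvature = 2.0 * sum(v * v for template in basis for v in template)
    if curvature == 0.0:
        raise ValueError("basis templates are all empty")
    coeffs = [0.0] * len(basis)
    for _ in range(n_iter):
        model = [sum(a * t[b] for a, t in zip(coeffs, basis)) for b in range(n_bins)]
        residual = [m - y for m, y in zip(model, target)]
        gradient = [2.0 * sum(r * t[b] for b, r in enumerate(residual)) for t in basis]
        # Gradient step followed by projection onto a_i >= 0.
        coeffs = [max(0.0, a - g / curvature) for a, g in zip(coeffs, gradient)]
    return coeffs


def basis_limit(coeffs, limits):
    """Assumed basis-limit relation: 1 / sum_i (a_i / limit_i)."""
    inverse = sum(a / mu for a, mu in zip(coeffs, limits))
    if inverse <= 0.0:
        raise ValueError("no positive coefficient found")
    return 1.0 / inverse


def predict_left_out(templates, limits, index):
    """Describe templates[index] with the others and predict its limit."""
    others = [t for i, t in enumerate(templates) if i != index]
    other_limits = [mu for i, mu in enumerate(limits) if i != index]
    coeffs = fit_nonnegative(others, templates[index])
    return coeffs, basis_limit(coeffs, other_limits)


# Hypothetical three-bin templates and relative limits, for illustration only.
templates = [[3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
limits = [1.0 / 0.085, 1.0 / 0.070, 1.0 / 0.055]
coeffs, predicted = predict_left_out(templates, limits, index=1)
print(coeffs, predicted, limits[1])
```

Leaving out the first template instead gives a poorer description, since the exact solution would need a negative coefficient; this mirrors the edge-of-range behaviour expected when no basis template exists on one side of the target.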

The ATLAS collaboration reported a search for heavy 
fourth-generation down-type chiral quarks ($b'$) using 1 
fb$^{-1}$ [18]. The $b'$ decays via $tW$, leading to a final state 
with four $W$ bosons and two $b$ quarks; see Fig. 5. The 
ATLAS search makes use of a novel technique for tagging 
boosted $W$ bosons by searching for jet pairs with 
small angular separation. The analysis variable is the jet 
multiplicity and $W$ boson multiplicity; see Figure 6. 

We generate $b' \to tW$ events using MadGraph [19], use 
PYTHIA [20] to model showering and hadronization, and 
PGS to describe the detector response. Details of the 
construction of the templates and the resulting limits are 
given in Table III. 

Figure 7 shows that the basis-limit estimation is reliable 
for this application. At the lower boundary, $m_{b'} = 
300$ GeV, it is difficult to find coefficients which give an 
accurate description; see Figure 6. 



FIG. 6: Jet multiplicity and hadronically-decaying $W$ multiplicity 
for $b' \to tW$ pair production (blue, dashed) expressed 
as a linear combination (red, solid) of distributions 
from other $b'$ masses; see Table III. Left is for $m_{b'} = 300$ GeV; 
right is for $m_{b'} = 450$ GeV. For the edge case of $m_{b'} = 300$ 
GeV, the sum of basis templates is not exact, due to the lack 
of templates at lower masses. 



TABLE III: Details of the self-consistency test using $b' \to tW$ 
decays at ATLAS. The template at each specific mass is 
expressed as a linear sum of templates at other masses; the 
predicted limit is then compared to the explicit calculation. 

NEW LIMITS ON A HEAVY EXOTIC QUARK: T 



Having shown the validity of the basis-limit hypoth- 
esis, we provide a demonstration of the calculation of 
limits on an untested signal hypothesis using published 
experimental results. 

An exotic heavy quark $T$ may decay as $T \to Wb$, 
$T \to tZ$, or $T \to tH$; see Fig. 8. The signature of the decay 
is similar to that of the $b'$ model, involving hadronic 
decays of boosted bosons ($W$, $Z$, $H$) and top quarks [21]. 
The CMS collaboration analyzed data with 1.14 fb$^{-1}$ of 
integrated luminosity and excluded at 95% CL such a $T$ 
quark below $m_T = 475$ GeV assuming BR($T \to tZ$) = 100% [23]. 
In the more likely configuration with other 
decay modes available and BR($T \to tZ$) < 25% (see 






FIG. 7: Prediction of limits on $b' \to tW$ decays at ATLAS. 
Limits at each mass are predicted by expressing the signal 
model at that mass in terms of signal models at other masses; 
these predicted limits agree well with the limits reported by 
ATLAS [18]; see Table III. 
FIG. 8: Decay modes of a heavy exotic quark, $T$. 

FIG. 9: Jet multiplicity and hadronically-decaying $W$ multiplicity 
for exotic heavy quark $T$ pair production and decay 
with $m_T = 450$ GeV (blue, dashed) expressed as a linear 
combination of distributions from the $b' \to tW$ model with varying 
masses; see Table IV. We include all $T$ decay modes; see 
Fig. 8. 



[21]), the CMS limit would be significantly weaker, 
perhaps $m_T > 250$ GeV. 

We combine all decay modes together to maximize the 
expected yield and to be sensitive to the broader model. 
As before, we construct templates for $T$ as linear combinations 
of the existing $b'$ templates (Fig. 9) and use 
Equation 1 to calculate new limits on the pair-production 
of $T$ at the LHC. 

Table IV shows the coefficients and calculated limits, 
also shown in Fig. 10. Using an approximate 
next-to-next-to-leading-order calculation [24] of the $T$ production 
cross section, our cross-section upper limit excludes 
a $T$ quark with mass $m_T < 419$ GeV, despite the low 
branching ratio, BR($T \to tZ$) = 15%, at this $m_T$. 
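Converting a cross-section limit curve into a mass limit amounts to finding where the limit curve crosses the theoretical prediction. The sketch below is one simple way to do this and is not taken from the paper: it interpolates the logarithm of the ratio linearly in mass, the function name `excluded_mass` is hypothetical, and the grid values are placeholder numbers rather than the values of Table IV.

```python
import math


def excluded_mass(masses, sigma_limit, sigma_theory):
    """Mass at which the limit curve first rises above the theory curve.

    Masses below the returned value are excluded. The crossing is found
    by linear interpolation of log(sigma_limit / sigma_theory) between
    neighbouring mass points; returns None if there is no crossing.
    """
    ratios = [math.log(lim / th) for lim, th in zip(sigma_limit, sigma_theory)]
    for i in range(len(masses) - 1):
        r0, r1 = ratios[i], ratios[i + 1]
        if r0 < 0.0 <= r1:
            frac = -r0 / (r1 - r0)
            return masses[i] + frac * (masses[i + 1] - masses[i])
    return None


# Placeholder grid (not real data): a flat limit and a falling theory curve.
masses = [300.0, 400.0, 500.0]
sigma_limit = [1.0, 1.0, 1.0]
sigma_theory = [4.0, 2.0, 0.5]
print(excluded_mass(masses, sigma_limit, sigma_theory))  # 450.0
```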

In addition, we repeat the study using only $T \to tZ$ 
decay modes, as these are most similar to the $b' \to tW$ 
mode originally analyzed by ATLAS; see Fig. 10. 



LIMITATIONS AND GENERALIZATIONS 

While the basis-limit hypothesis is an intuitive and ef- 
fective construction, it is a heuristic formula. We do not 
provide its derivation from basic statistical axioms. 
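Although no general derivation is offered, the relation can be checked by hand in the single-bin limit. The following short calculation is a sketch under one stated assumption: the upper limit on the number of signal events, written here as $N^{\rm limit}$, is the same for every signal hypothesis (identical background, data and systematic uncertainties). The symbols $s_i$, $\mu_i$ and $N^{\rm limit}$ are introduced only for this illustration, and the form of Equation 1 used in the last step is the harmonic form that reproduces Table I.

```latex
% Single bin: template f_i is just an expected yield s_i at the theory cross section.
% Relative limit on hypothesis i:
\mu_i \equiv \frac{\sigma_i^{\rm limit}}{\sigma_i^{\rm theory}} = \frac{N^{\rm limit}}{s_i}
% New hypothesis F = \sum_i a_i f_i has yield s_F = \sum_i a_i s_i, so
\mu_F = \frac{N^{\rm limit}}{\sum_i a_i s_i}
      = \frac{1}{\sum_i a_i \, (s_i / N^{\rm limit})}
      = \left[ \sum_i \frac{a_i}{\mu_i} \right]^{-1}
```

In this special case the basis-limit relation is exact; the heuristic content of the hypothesis lies in extending it to multi-bin fits.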

In some scenarios, it may fail to provide an accurate 
prediction of the limit on a new model, as mentioned 
above. Indeed, we have observed some artificial scenar- 
ios in which the basis templates have very little overlap 



TABLE IV: Details of the predicted limit on $T$ pair-production 
and decay, using basis templates from $b' \to tW$ 
decays at ATLAS. We include all $T$ decay modes; see Fig. 8. 
such that a bias may occur in the prediction. Some corrections 
may be calculable in this scenario, leading to modification 
of the $a_i$ coefficients based on the a priori overlap of 
the basis templates. In most cases, where the signal templates 
come from new physics processes with a single slowly 
varying parameter, such as the mass of a new particle, 
the templates have substantial overlap and the correction 
is negligible. 

Alternatively, we might express Eq. 1 in another form, 
as 

$$\frac{\sigma^{\rm limit}}{\sigma^{\rm theory}} = \frac{1}{\sum_{i=1}^{n} x_i k_i},$$

where $x_i$ is the bin content of the $i$-th bin for an $n$-bin 
FIG. 10: Upper limits at 95% CL on the production of an 
exotic heavy quark $T$. Top includes all decays (see Fig. 8); 
bottom includes only $T \to tZ$ decays. The theoretical prediction 
is at approximate next-to-next-to-leading order [24]. 



analysis, and $k_i$ is an unknown constant which depends 
only on the background and data in the $i$-th bin. Rather 
than expressing the new signal in terms of basis templates, 
we could solve for the $k_i$ given a set of $n$ limits 
on $n$ signal templates. This would allow the calculation 
of a limit on an arbitrary signal template without concern 
for an overlap correction as discussed above. We 
leave this for future investigation. 
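The per-bin formulation suggests a direct numerical recipe: with $n$ published limits on $n$ linearly independent $n$-bin templates, the relations $\sum_b x_{i,b} k_b = 1/\mu_i$ form a linear system for the constants $k_b$. The sketch below assumes the per-bin form $\sigma^{\rm limit}/\sigma^{\rm theory} = 1/\sum_b x_b k_b$ and uses hypothetical two-bin templates and limits; it is an illustration of the idea, which the text leaves for future investigation, not a validated method.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("templates are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def bin_constants(templates, limits):
    """Solve sum_b templates[i][b] * k_b = 1 / limits[i] for the k_b."""
    return solve_linear(templates, [1.0 / mu for mu in limits])


def limit_from_constants(template, constants):
    """Relative limit on an arbitrary template: 1 / sum_b x_b * k_b."""
    return 1.0 / sum(x * k for x, k in zip(template, constants))


# Hypothetical two-bin basis templates and their relative limits.
templates = [[100.0, 0.0], [50.0, 50.0]]
limits = [0.5, 2.0 / 3.0]
k = bin_constants(templates, limits)
print(k, limit_from_constants([20.0, 80.0], k))
```

Under this form, the basis-limit relation follows immediately for any template that is a linear combination of the basis templates, since $1/\mu_F = \sum_b F_b k_b = \sum_i a_i / \mu_i$.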



CONCLUSIONS 

The basis-limit hypothesis provides a tool for reinterpreting 
the results of experimental analyses using multi-bin 
data. Previously, only single-bin analyses could be 
reinterpreted. 

Some technical hurdles remain; for example, if the pub- 
lished analysis uses a complex technique (such as a multi- 
variate analysis tool) and does not publish enough detail, 
then the selection cannot be reproduced. This also ap- 
plies to a single-bin analysis. 

Superior solutions to the one we propose here are: 

• Archiving and streamlining by the experiments 
of published analyses, allowing for a rapid 
reinterpretation in terms of a new model. This has 
the disadvantage that it places the burden on the 
experiments. 

• Publication by the experiments of all of the details 
necessary to reproduce the analysis. This has the 
disadvantage that it requires use of an approximate 
publicly-available simulation. 

As neither of these is currently available, the basis-limit 
approach makes a wide range of results available for 
constraining current and future models. 

We use this approach to interpret an ATLAS search 
for $b' \to Wt$ to set the strongest limit on an exotic heavy 
quark $T$ which decays via $T \to tZ$, $tH$, $Wb$: $m_T > 419$ GeV 
at 95% confidence level. 